Genomics, Proteomics & Bioinformatics
Oxford University Press (OUP)
Preprints posted in the last 30 days, ranked by how well they match Genomics, Proteomics & Bioinformatics's content profile, based on 171 papers previously published here. The average preprint has a 0.29% match score for this journal, so anything above that is already an above-average fit.
Wang, B.; Wan, S.; Zhang, P.; Zhang, Y.; Wang, X.; Dong, L.; Ye, K.; Yang, X.
The complete assembly of the human Y chromosome remains a challenge due to its highly repetitive and complex structure. While complete telomere-to-telomere (T2T) assemblies have been generated for a few individuals, such high-quality resources for East Asian populations, particularly for well-characterized multi-omics reference cohorts, are still scarce. The Chinese Quartet, comprising monozygotic twin daughters and their parents, is a premier reference material for genomic studies, yet a T2T-level Y chromosome assembly for this pedigree was lacking. Here, we present a complete, gapless T2T assembly of the Y chromosome (designated CQ-chrY) from the father of the Chinese Quartet. This assembly was generated by integrating Oxford Nanopore ultra-long reads, PacBio HiFi reads, and Hi-C data, resulting in a sequence of 61.88 Mb. The assembly shows exceptional base accuracy (QV = 51.09) and structural completeness (GCI = 100; CRAQ AQI = 95.217). We completely resolved the 33.52 Mb Yq12 heterochromatic region and annotated 164 protein-coding genes and 51.03 Mb (82.47%) of repetitive sequences. This CQ-chrY assembly represents the third complete Chinese Y chromosome and fills the last gap in the T2T assemblies of the Quartet family, providing an invaluable paternal haplotype resource for expanding East Asian genomic standards and for studies on Y chromosome structural variation and evolution.
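As a reading aid for the base-accuracy figure in the abstract above, the sketch below converts a quality value into a per-base error probability. It assumes the reported QV is the usual Phred-scaled consensus quality, QV = -10 * log10(p_error), which the abstract does not state explicitly; the function names are illustrative only.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value into a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p_error: float) -> float:
    """Inverse conversion: per-base error probability to Phred-scaled QV."""
    return -10 * math.log10(p_error)


# QV reported for CQ-chrY in the abstract (Phred scale is an assumption).
rate = qv_to_error_rate(51.09)
print(f"error rate ~ {rate:.2e}, roughly 1 error per {1 / rate:,.0f} bases")
```

Under that assumption, a QV above 50 corresponds to fewer than one expected error per 100,000 bases.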
Yao, F.; He, J.; Nyaruaba, R.; Chen, F.; Zhou, J.; Yang, H.; Wei, H.; Li, Y.
Microorganisms significantly influence human health, and dysbiosis of the oral microbiome plays a critical role in the development and progression of both oral and systemic diseases. This highlights the urgent need for novel therapeutics targeting specific pathogens. Here, we presented a structure-based pipeline to efficiently identify potential phage-derived periodontal lysins (LysPds) from nearly one million proteins. We predicted the structures of candidate lysins using AlphaFold2 and developed an innovative structure-based similarity network to classify them into distinct clusters, each with unique functional properties. A systematic characterization of 16 representative LysPds from 11 superfamilies revealed that over 90% demonstrated potent antibacterial activity against key periodontal pathogens. Among these, LysPd078 was identified as a promising preclinical drug candidate, effectively reconfiguring microbiome communities while demonstrating significant efficacy and safety in mouse models of periodontitis and calvarial infection. Our findings highlight the effectiveness of structure-based similarity networks in exploring vast protein spaces and underscore the potential of LysPd078 as a targeted modulating agent for the oral microbiome.
Gao, Q.; Song, Y.; Yang, Y.; Wang, S.; Ruan, X.; Liu, Z.; Guo, D.; Chen, Y.; Wang, X.; Chen, R.; Xu, H.; Lin, F.
In agriculture, propiconazole (PCZ) controls excessive growth in flowering Chinese cabbage but poses dietary safety risks due to residue accumulation. Therefore, identifying novel PCZ targets and breeding PCZ-free cultivars is critical for the safe production of flowering Chinese cabbage. Here, we identified three P4-ATPase flippase homologs aminophospholipid ATPase 3 (BraALA3a/b/c) in flowering Chinese cabbage that function as sensitive targets for PCZ. These proteins exhibit high binding affinity for PCZ, which directly inhibits their ATPase activity. Overexpression of the BraALA3 homologs enhanced plant growth and increased sensitivity to PCZ, whereas knockdown led to dwarfism and reduced sensitivity. Based on these findings, we identified editable active sites via protoplast-based screening. Genetic transformation of one such site yielded BraALA3a/braala3aK200T mutant lines, which displayed a dwarf and compact architecture. These findings provide a precise molecular target for developing PCZ-free germplasm in flowering Chinese cabbage through gene editing.
Choi, S.; Lee, N.; Jeon, H.; Park, J.; Kim, S.; Kim, J.-E.; Shin, J.; Moon, H.; Min, K.; Choi, Y.; Hwangbo, A.; Kim, H.; Choi, G. J.; Lee, Y.-W.; Song, D.-G.; Son, H.
WD40 is a highly conserved protein domain in eukaryotes that plays a critical role in various cellular processes. We conducted a genome-wide functional analysis of WD40 genes in Fusarium graminearum, a phytopathogenic fungus that causes severe yield loss and mycotoxin contamination in major cereal crops. Comprehensive phenome analysis of 119 WD40 gene deletion mutants across 22 distinct phenotypic traits revealed phenotypic divergence within the phenome, establishing a strong correlation between virulence and sexual reproduction. Notably, 21 "core WD40 genes" were identified, offering valuable insights into divergent biological processes. Pilot interactome studies of Fgwd101 and Fgwd133 provided further insights into their potential pathobiological functions. Our investigation broadens our knowledge of the biological mechanisms underlying fungal pathogenesis and may assist in the identification of targets for antifungal agents.
Simmons, J. R.; Xue, T.; McCord, R. P.; Wang, J.
Programmed DNA elimination (PDE) is a notable exception to genome integrity, characterized by significant DNA loss during development. In many nematodes, PDE is initiated by DNA double-strand breaks (DSBs), which lead to chromosome fragmentation and subsequent DNA loss. However, the mechanism of nematode programmed DNA breakage remains largely unclear. Interestingly, in the human and pig parasitic nematode Ascaris, no conserved motif or sequence structures are present at chromosomal breakage regions (CBRs), suggesting that the recognition of CBRs may be sequence-independent. Using Hi-C, we revealed that Ascaris CBRs engage in three-dimensional (3D) interactions before PDE, indicating that physical contacts between break regions may contribute to the PDE process. The 3D interactions are established in both Ascaris male and female germlines, demonstrating inherent genome organization associated with the CBRs and to-be-eliminated sequences. In contrast, in the unichromosomal horse parasite Parascaris univalens, transient pairwise interactions between neighboring CBRs that will form the ends of future somatic chromosomes were observed only during PDE. Intriguingly, we found that Ascaris PDE, which converts 24 germline chromosomes into 36 somatic ones, induces specific compartmentalization changes. Remarkably, Parascaris PDE generates the same set of 36 somatic chromosomes, and the 3D compartment changes following PDE are consistent between the two species. Overall, our findings suggest that CBRs spatially demarcate the retained and eliminated DNA and may contribute to their spatial organization during Ascaris PDE. We also demonstrated that the 3D genome reorganization of the somatic chromosomes in these nematodes following PDE is evolutionarily and developmentally conserved.
Forey, R.; Raclot, C.; Dorschel, A.; Archambeau, J.; Planet, E.; Bompadre, O.; Offner, S.; Matsushima, W.; van der Goot, F. G.; Trono, D.
Krüppel-associated box zinc finger proteins (KZFPs) form the largest family of transcriptional regulators in mammals, yet most remain uncharacterized. Here we established a scalable framework to probe KZFP function. An arrayed inducible overexpression screen of 366 human KZFPs in K562 cells identified factors that alter cellular proliferation, enabling functional prioritization. Integrative transcriptomic, chromatin, and proteomic analyses revealed diverse mechanisms, including transposable element-linked repression (ZNF43), promoter-proximal regulation (ZNF257), and SCAN domain-dependent transcriptional activation (ZNF498/ZSCAN25 and ZNF18). These results highlight the functional diversity of KZFPs and provide a strategy for their annotation.
Hou, G.; Xu, S.; Zhao, F.; Duan, L.; Yang, H.; Li, J.; Zhou, F.; Hu, Y.; Liu, S.
Esophageal squamous cell carcinoma (ESCC) still lacks clinical molecular subtyping and effective therapeutic strategies. Herein, a total of 46 paired ESCC tissue samples were collected and subjected to a systematic proteogenomic evaluation. Consensus assessment of the ESCC-related transcriptomes and the TCGA dataset revealed several consensus modes of gene expression related to ESCC specificity, with 8 plasma-detectable hub proteins that could discriminate ESCC from others. Three ESCC molecular subtypes were defined and validated based on proteome data: pCC1, with an activated immune response and the best survival outcome; pCC2, a cell cycle subtype with a relatively worse outcome; and pCC3, with the worst outcome and higher expression of cell adhesion-related proteins. Furthermore, we proposed potential therapeutic strategies for improving survival outcomes in patients with different ESCC molecular subtypes. This integrative proteogenomic analysis provided a novel view of ESCC-dependent molecular information.
Li, P.; Li, C.; Zhu, R.; Sun, W.; Zhou, H.; Fan, Z.; Yue, L.; Zhang, S.; Jiang, X.; Luo, Q.; Han, J.; Huang, H.; Shen, A.; Bahetibieke, T.; Wang, J.; Zhang, W.; Wen, H.; Niu, H.; Bu, C.; Zhang, Z.; Xiao, J.; Gao, R.; Chen, F.
Tuberculosis (TB), caused by Mycobacterium tuberculosis (MTB), has regained its position as the world's leading killer among infectious diseases. Despite extensive research progress across epidemiology, diagnosis, drug development, treatment regimens, vaccines, drug resistance, virulence factors, and immune mechanisms, MTB-related knowledge remains fragmented across thousands of publications, limiting its effective use. To address this gap, we present MTB-KB, a literature-curated knowledgebase that systematically integrates high-impact findings from eight major sections of TB research. The current release contains 75,170 associations from 1,246 publications, covering 18,439 entities standardized using authoritative databases and WHO-endorsed classifications. A central feature is the interactive knowledge graph, which links cross-section associations to reveal and infer MTB-host interactions, treatment strategies, and vaccine development opportunities. MTB-KB also provides a user-friendly interface with browsing, advanced search, and statistical visualization. Overall, by consolidating dispersed MTB knowledge into a structured and accessible platform, MTB-KB provides a valuable resource for researchers, clinicians, and policymakers, supporting both basic and clinical TB research, enabling evidence-based TB prevention, diagnosis, and treatment, and contributing to global elimination efforts. MTB-KB is accessible at https://ngdc.cncb.ac.cn/mtbkb/.
Zheng, J.; Steinfelder, R. S.; Yin, H.; Qu, C.; Thomas, M.; Thomas, S. S.; Andrews, C.; Augusto, B.; Corley, D. C.; Lee, J. K.; Berndt, S. I.; Chan, A. T.; Chanock, S. J.; Gignoux, C.; Goldberg, S. R.; Haiman, C. A.; Huyghe, J. R.; Iwasaki, M.; Le Marchand, L.; Lee, S. C.; Melendez, J.; Mesa, I.; Ogino, S.; Sifontes, V.; Um, C. Y.; Visvanathan, K.; White, L. L.; Williams, A.; Willis, W.; Wolk, A.; Yamaji, T.; Vadaparampil, S. T.; Jarvik, G. P.; Burnett-Hartman, A. N.; Milne, R. L.; Platz, E. A.; Figueiredo, J. C.; Zheng, W.; MacInnis, R. J.; Palmer, J. R.; Schmit, S. L.; Landorp-Vogelaar, I.;
Colorectal cancer (CRC) is a leading cause of cancer-related death, with incidence rising substantially among individuals under 50 years of age. Polygenic risk scores (PRS) hold promise for identifying high-risk individuals; when combined with lifestyle factors, they substantially improve prediction accuracy compared with models based on lifestyle factors alone. However, few clinical tools currently exist that facilitate this integrated, PRS-enhanced risk assessment. To bridge this gap, we developed MyGeneRisk Colon, a publicly accessible web portal that delivers individualized CRC risk prediction by incorporating genetic, demographic, family history, and lifestyle factors. This paper details the development of the underlying risk prediction model, the portal's architecture and data security, our reporting framework, and engagement with a community advisory panel. Designed as a user-friendly platform, MyGeneRisk Colon aims to effectively communicate personalized CRC risk profiles and educate users and healthcare providers about prevention strategies.
Poudel, A.; Wu, Y.
Common bermudagrass (Cynodon dactylon) is a highly resilient and cosmopolitan grass widely used for turf, forage, and soil stabilization. Although its genome has been sequenced, few studies have characterized the genes underlying its resilience, including the NAC transcription factor (TF) family, which is well known for its physiological and stress-related functions. This study aimed to systematically characterize NAC TF genes in the bermudagrass genome and assess their potential roles in abiotic stress tolerance. A total of 237 CdNAC genes were identified and phylogenetically classified into 14 groups, including 40 members in the NAM/NAC1 class, which is associated with plant growth and development, and 23 members in the SNAC class, which is associated with stress responses. Tissue-specific RNA-seq analysis indicated that about one-fourth of CdNAC genes were expressed across all tissues, whereas 13 genes showed relatively higher expression in roots and 9 in inflorescence, suggesting both essential and specialized functions. Stress-responsive expression profiling revealed that 35 CdNAC genes were upregulated in response to drought, 43 to heat, 10 to salt, and 42 to submergence stress. Notably, CdNAC122, 149, and 155, which are members of the SNAC class, were consistently upregulated across all stress conditions, while others exhibited stress-specific expression, such as CdNAC37, 130, 145, and 199 in drought, CdNAC7, 12, 18, and 29 in heat, CdNAC46 and 151 in salt, and CdNAC9 and 31 in submergence. In contrast, 53 genes were downregulated during different stresses, with most belonging to the NAM/NAC1, TERN, or OsNAC7 classes, possibly reflecting suppression of photosynthesis and development-related processes under stress. These results provide the first comprehensive characterization of CdNAC genes, reveal their distinct regulatory roles in abiotic stress responses, and establish a foundation for future functional validation and applications in breeding of stress-resilient bermudagrass.
Seckin, E.; Colinet, D.; Bailly-Bechet, M.; Seassau, A.; Bottini, S.; Sarti, E.; Danchin, E. G.
Orphan genes, lacking homologs in other species, are systematically found across genomes. Their presence may result from extensive divergence from pre-existing genes or from de novo gene birth, which occurs when a gene emerges from a previously non-genic region. In this study, we identified orphan genes in the genomes of globally distributed plant-parasitic nematodes of the genus Meloidogyne and investigated their origins, evolution, and characteristics. Using a comparative genomics framework across 85 nematode species, we found that 18% of Meloidogyne genes are genus-specific, transcriptionally supported orphans. By combining ancestral sequence reconstruction and synteny-based approaches, we inferred that 20% of these orphan genes originated through high divergence, while 18% likely emerged de novo. Proteomic and translatomic evidence confirmed the translation of a subset of these genes, and feature analyses revealed distinctive molecular signatures, including shorter length, signal peptide enrichment, and a tendency for extracellular localization. These findings highlight orphan genes as a substantial and previously underexplored component of the Meloidogyne genome, with potential roles in their worldwide parasitism.
Hess, F.; Chen, Y.; Lopez Ortiz, M. E.; Colliquet, A.; Stoffel-Studer, I.; Mac, V.; Grob, S.; Koelliker, R.; Studer, B.
Common buckwheat (Fagopyrum esculentum Moench) is a globally cultivated pseudocereal with a high nutritional quality and economic value. Due to its self-incompatibility, common buckwheat exhibits a high level of heterozygosity, making genome assembly challenging. Consequently, reference-level haplotype-resolved assemblies of common buckwheat are scarce, hindering research and genomics-assisted breeding. Here, we present a near-complete, chromosome-level, haplotype-resolved assembly of a common buckwheat F1 genotype (named Tuka), generated using a trio-binning approach that integrated parental Illumina short-read data with PacBio HiFi and Hi-C data from Tuka. The Tuka assembly comprises two haplomes, Tuka_h1 and Tuka_h2, both showing high contiguity (contig N50 of 76.68 Mb and 84.57 Mb, respectively), high completeness (assembly sizes of 1.28 Gb and 1.23 Gb with BUSCO scores of 96.9% and 96.8%, respectively), high base-level accuracy (QV of 59.08 and 63.03, respectively), and few gaps (35 and 30, respectively). This near-complete assembly of Tuka serves as a valuable genomic resource for common buckwheat, enabling advanced genomic analyses and accelerating research and breeding using state-of-the-art genomic tools.
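The contiguity figures in the abstract above are contig N50 values. As a minimal sketch of the standard definition (the length L such that contigs of length at least L together cover at least half of the total assembly), the following may help; the function name and the toy lengths are illustrative and are not taken from the Tuka assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of the
    total assembly size.
    """
    if not lengths:
        raise ValueError("n50 requires at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for "at least half"
            return length


# Toy example with made-up contig lengths (arbitrary units).
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract reports it alongside BUSCO and QV.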
Yu, X.; Yan, R.; Li, H.; Xie, Y.; Bi, M.; Li, Y.; Roccuzzo, A.; Tonetti, M. S.
Aim: To comprehensively characterize the salivary proteome in periodontitis using Orbitrap Astral data-independent acquisition mass spectrometry (DIA-MS), identify an atlas of differentially expressed proteins (DEPs), and develop a machine learning-derived multi-protein biomarker panel for non-invasive diagnosis of stage III/IV periodontitis. Materials and Methods: Unstimulated saliva samples from 199 participants (periodontal health/gingivitis, n=120; stage III/IV periodontitis, n=79) were analyzed by Orbitrap Astral DIA-MS. DEPs were identified, and pathway enrichment analysis was performed. A two-tier machine learning pipeline, integrating pathway-based feature selection with cross-validated evaluation, was applied to identify the optimal diagnostic panel. Results: Orbitrap Astral DIA-MS quantified 5,597 salivary proteins and 1,966 DEPs (|log2FC|>0.5, FDR<0.05). Pathway analysis identified 14 periodontitis-relevant KEGG pathways, including Th17 cell differentiation, IL-17 signaling, neutrophil extracellular trap formation, and complement and coagulation cascades. A four-protein panel (TEC, RAC1, MAPK14, KRT17) achieved an area under the curve (AUC) of 0.985 ± 0.010, with 83% sensitivity and 100% specificity. The panel was corroborated using public datasets. Conclusions: To our knowledge, this study represents the first application of Orbitrap Astral DIA mass spectrometry in periodontitis research, establishing a disease-specific DEPs atlas and a salivary biomarker panel with high diagnostic accuracy for stage III/IV periodontitis, providing a foundation for future external validation studies.
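The DEP criterion quoted in the abstract above (|log2FC| > 0.5 and FDR < 0.05) can be written as a simple filter. The sketch below only illustrates that threshold rule; the record type, function name, and protein entries are hypothetical placeholders, not data or code from the study.

```python
from typing import NamedTuple


class ProteinResult(NamedTuple):
    """One row of a differential-abundance table (hypothetical layout)."""
    protein: str
    log2_fc: float  # log2 fold change, periodontitis vs. control
    fdr: float      # multiple-testing-adjusted p-value


def select_deps(results, min_abs_log2_fc=0.5, max_fdr=0.05):
    """Keep proteins passing both strict thresholds from the abstract."""
    return [
        r for r in results
        if abs(r.log2_fc) > min_abs_log2_fc and r.fdr < max_fdr
    ]


# Made-up rows for illustration only.
demo = [
    ProteinResult("PROT_A", 1.20, 0.001),   # passes both thresholds
    ProteinResult("PROT_B", -0.80, 0.030),  # passes (downregulated)
    ProteinResult("PROT_C", 0.30, 0.001),   # fold change too small
    ProteinResult("PROT_D", 2.00, 0.200),   # not significant after FDR
]
deps = select_deps(demo)
print([r.protein for r in deps])
```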
Buyan, A.; Gazizova, G.; Zgoda, V. G.; Vavilov, N. E.; Gryzunov, N.; Eliseeva, I. A.; Nozdrin, V.; Sergeeva, Y.; Titova, A.; Shigapova, L.; Erina, A. V.; Mescheryakov, G.; Murtazina, A.; Deviatiiarov, R.; Forrest, A. R. R.; Makeev, V.; Hayashizaki, Y.; Popov, D.; Shagimardanova, E.; Kulakovskiy, I. V.; Gusev, O.
More than 600 distinct skeletal muscles constitute up to 40% of the total mass of the human body. Human skeletal muscles differ in anatomical position, morphology, origin, and function, but the diversity of their molecular phenotypes, the gene expression and protein abundance profiles, remains poorly explored. Here, we report the large-scale CAGE-Seq promoterome profiling of 75 human skeletal muscles, complemented by 22 matched proteomes obtained with mass spectrometry. We identified 37001 transcribed regulatory elements and 1804 protein groups encompassing 1895 proteins, 80% of which demonstrated non-uniform expression across different muscles. The skeletal muscles of the eye, tongue, and diaphragm had the most distinctive molecular phenotypes, while the overall diversity was driven by hundreds of transcription factors with tissue-specific activity. By analyzing the allelic imbalance of CAGE-Seq reads, we discovered 6653 allele-specific single-nucleotide variants often coinciding with muscle-related GWAS SNPs, including muscle volume. Finally, we provide an interactive online atlas of transcriptomic and proteomic molecular phenotypes, facilitating further studies of gene regulation and heritable pathologies of skeletal muscles.
Zhang, S.; Liu, X.; Lou, J.; Jiang, M.; He, Z.
Biological sequence clustering is a fundamental problem in bioinformatics, yet most existing methods mainly optimize clustering quality or efficiency while offering limited insight into why sequences are grouped together. This restricts their usefulness in downstream analysis, where representative sequences and clear cluster boundaries are often needed. To address this issue, we propose iClust, an interpretable clustering method that characterizes each cluster by a representative prototype and an adaptive radius. By adapting to local sequence structure rather than relying on a single global threshold, iClust produces clusters that are both meaningful and explainable. A final consolidation step further reduces tiny fragments and improves structural stability. Experiments on simulated and real biological sequence datasets show that iClust achieves competitive clustering performance while providing clearer cluster-level explanations than conventional threshold-based methods. In addition to its empirical impact as a practical clustering method for biological sequences, this article opens up new avenues for developing biological sequence clustering approaches from the viewpoint of interpretable machine learning.
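To make the "prototype plus adaptive radius" idea in the abstract above concrete, here is a minimal sketch of how such a cluster description could explain an assignment. This is not the iClust algorithm: the distance measure (normalized edit distance), the `Cluster` type, the radii, and the assignment rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


@dataclass
class Cluster:
    name: str
    prototype: str  # representative sequence
    radius: float   # per-cluster radius (hypothetical values below)


def explain_assignment(seq, clusters):
    """Assign seq to the nearest cluster whose radius covers it, with a reason."""
    best = None
    for cluster in clusters:
        d = normalized_distance(seq, cluster.prototype)
        if d <= cluster.radius and (best is None or d < best[1]):
            best = (cluster, d)
    if best is None:
        return None, "outside every cluster radius"
    cluster, d = best
    return cluster.name, (f"distance {d:.2f} to prototype {cluster.prototype} "
                          f"is within radius {cluster.radius:.2f}")


clusters = [Cluster("A", "ACGTACGT", 0.25), Cluster("B", "TTTTGGGG", 0.40)]
print(explain_assignment("ACGTACGA", clusters))
```

The point of the sketch is that each cluster carries its own radius, so the explanation for a membership decision is a single comparison against that cluster's prototype rather than against one global threshold.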
He, Z.; Li, Y.; Shkurat, T. P.; Butenko, E. V.; Derevyanchuk, E. G.; Lomteva, S. V.; Chen, L.; Lipovich, L.
Background: Polycystic ovary syndrome (PCOS) is a prevalent endocrine disorder and a leading cause of female infertility, with complex genetic, metabolic, and hormonal etiologies. Long non-coding RNAs (lncRNAs) have emerged as important regulators of diverse biological processes, yet their roles in PCOS remain underexplored. Here, we identified and characterized PCOS differentially expressed gene-associated lncRNAs (PDEGAL) with an integrative approach combining expression data, genetic association, and evolutionary analysis. Methods: Thirty-three PCOS-associated protein-coding genes were obtained from our prior study, and all their nearby and overlapping lncRNAs were annotated. These candidates were analyzed using UCSC Genome Browser-mapped annotations and datasets, including NCBI RefSeq, GENCODE, GTEx, GWAS SNPs, and conservation, as well as the FANTOM5 cap analysis of gene expression (CAGE) promoter data, to assess their expression, regulatory potential, genetic variant overlaps, and evolutionary conservation. Results: Twenty-three PDEGALs (18 antisense to, and 5 sharing bidirectional promoters with, known PCOS-associated protein-coding genes) were identified. 17 PDEGALs contained GWAS SNPs with statistically significant disease associations, 9 of which were associated with PCOS-related traits. 5 PDEGALs demonstrated expression in the KGN granulosa cell model of PCOS. Key gene structure element (KGSE) analysis revealed that most PDEGALs are primate-specific. Integrating four criteria (GTEx expression, GWAS SNPs, FANTOM promoterome, and KGSE conservation) highlighted HELLPAR as the only lncRNA fulfilling all four, while five others (PGR-AS1, MTOR-AS1, ENSG00000265179, ENSG00000256218, and LOC105377276) fulfilled three of the four criteria. Conclusions: We have systematically identified candidate PCOS regulatory lncRNAs with convergent genetic, expression, and evolutionary evidence. These results provide a framework for functional validation and highlight lncRNAs as potential biomarkers and therapeutic targets in PCOS that function by regulating their nearby and overlapping protein-coding genes.
Vaz Santos, M.; Schomakers, B. V.; Llobet Ayala, M.; Jamali, T.; van Weeghel, M.; van Pelt, A. M. M.; Mulder, C. L.; Hamer, G.
Primordial germ cells (PGCs) are the population of cells that, in the human embryo, are specified at day 12 post-fertilization and form the precursor cells for the future egg or sperm cells. Although in vitro differentiation of PGCs from human stem cells has been achieved, these primordial germ cell-like cells (hPGCLCs) fail to mature further. The reason for this is unclear. Previous studies in mice revealed that several specific metabolic changes occur during the maturation of these cells, which are essential for their developmental progress. However, very little is known about the metabolic profile of human primordial germ cells. Given the severe scarcity of human PGCs, hPGCLCs serve as a research model to study PGC formation. To investigate this, we differentiated hPGCLCs from induced pluripotent stem cells and performed a mass spectrometry analysis to establish their metabolome and proteome. These cells revealed a distinct metabolic profile, with changes particularly at the proteome level. This included a shift between the canonical and non-canonical citric acid cycle in hPGCLCs, downregulation of late-stage glycolysis, and reduction of de novo nucleotide synthesis. By providing an integrative map of these metabolic networks, we aim to provide insight into the influence of metabolism on human PGC development that could help improve methods for in vitro differentiation and maturation of hPGCLCs.
Wolters, F. C.; Woldu Semere, T.; Schranz, M. E.; Medema, M. H.; Bouwmeester, K.; van der Hooft, J. J. J.
Plants produce diverse bouquets of specialized metabolites (SMs), yet only a fraction of the vast phytochemical space has been explored to date. Comparative analysis of SM profiles can reveal hotspots of biochemical novelty, but systematic profiling across taxonomic levels does not presently cover large plant families. To study core and accessory SM profiles in the Brassicaceae plant family, we fingerprinted 14 species by liquid chromatography-tandem mass spectrometry (LC-MS/MS). We developed standardized experimental and computational workflows integrating in silico annotation tools to study consensus compound class and substructure distributions of SMs. Furthermore, we investigated the congruence of chemotaxonomy and species phylogeny across an extended panel of 17 species. Unique metabolite profiles were outstanding in Camelina sativa, Capsella rubella, and B. vulgaris, with the largest unique terpenoid profile annotated in C. sativa, accounting for 33.5% and 55.6% in positive and negative ionization mode, respectively. Substructure motifs were found to overlap with compound class predictions, highlighted for triterpenoids in Camelinodae. Furthermore, dual-tissue chemotaxonomic clustering resembled relationships of Brassica subgenomes across tissues. We anticipate that our systematic approach can serve as a blueprint for investigating biochemical diversity in other plant lineages and can boost the characterization of plant natural product pathways.
Gordillo-Gonzalez, F.; Galiana-Rosello, C.; Grillo-Risco, R.; Soler-Saez, I.; Hidalgo, M. R.; Siomi, H.; Kobayashi-Ishihara, M.; Garcia-Garcia, F.
We present a novel integrative analysis of transposable elements (TEs) in four single-cell RNA-seq (scRNA-seq) datasets of postmortem substantia nigra pars compacta samples from Parkinson's disease (PD) patients and matched healthy controls, with the objective of building a trustworthy cell-type-specific atlas of TEs that may clarify the role of TEs in sex differences in PD. We used the soloTE tool to evaluate TE expression changes across all snRNA-seq studies identified in our previous systematic review, and then integrated the results using meta-analysis techniques. Finally, we evaluated possible associations between TEs and protein-coding genes by integrating our previous results on this matter with the TE information obtained, in order to propose possible mechanisms of action by which some TEs contribute to PD.
Tanaka, H.; Ono, E.; Segawa, T.; Murata, J.; Takagi, H.; Uegaki, Y.; Toyonaga, H.; Shiraishi, A.; Takagi, M.; Toyoda, A.; Sato, K.; Wakasugi, T.; Horikawa, M.; Kawase, M.; Itoh, T.; Yamamoto, M. P.
Sesame (Sesamum indicum) is one of the earliest domesticated oilseed crops and is valued for antioxidant lignans that stabilize oil quality. However, the genomic and evolutionary history of the genus Sesamum, including the origin of its allotetraploid relative S. radiatum and the diversification of lignan metabolism, remains poorly understood owing to limited chromosome-scale genomic resources. Here we present chromosome-level genome assemblies for three wild Sesamum species, two Ceratotheca species and a Japanese sesame cultivar to reconstruct genome and karyotype evolution across the Sesamum-Ceratotheca complex. Comparative analyses show that the derived x=16 lineage originated from an ancestral x=13 karyotype through chromosome fission, fusion and translocation, whereas another x=13 lineage underwent extensive restructuring associated with retrotransposon expansion. Phylogenomics places Ceratotheca within the x=16 Sesamum clade and reveals that S. radiatum originated through hybridization involving a C. sesamoides-like ancestor. The antioxidative lignan gene CYP92B14 was reintroduced via the BB progenitor, linking hybridization with restoration of oil-stabilizing metabolism during sesame evolution.